64 research outputs found
Quality-Driven video analysis for the improvement of foreground segmentation
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones, on 15-06-2018. It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo).
Background initialization for the task of video-surveillance
In this work, we propose a region-wise, batch-processing approach for background initialization in video-surveillance based on a spatio-temporal analysis.
First, the related work is reviewed. Then, the efforts focus on developing a new background initialization approach that outperforms the literature. To this end, a temporal analysis and a spatial analysis are performed. In the first stage, we extend techniques from previous work with motion information to increase performance. In the second stage, a multipath iterative reconstruction scheme builds the true background under the assumption of background smoothness, i.e., the empty scene is smoother than the scene with foreground regions.
Finally, results over challenging video-surveillance sequences show the quality of the proposed approach against the related work.
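The temporal stage lends itself to a compact illustration. Below is a minimal Python sketch under strong simplifying assumptions: plain frame differencing stands in for the motion information, and a per-pixel temporal median over static observations stands in for the region-wise analysis; the function name init_background and the threshold are illustrative, and the multipath spatial reconstruction of the full method is not reproduced here.

```python
import numpy as np

def init_background(frames, motion_thresh=15.0):
    # frames: list of HxWx3 uint8 images from a static camera
    stack = np.stack(frames).astype(np.float32)      # (T, H, W, 3)
    gray = stack.mean(axis=3)                        # (T, H, W)
    # Flag pixels that moved between consecutive frames.
    diff = np.abs(gray[1:] - gray[:-1]) > motion_thresh
    static = np.ones_like(gray, dtype=bool)
    static[1:] &= ~diff                              # moving w.r.t. previous frame
    static[:-1] &= ~diff                             # moving w.r.t. next frame
    # Median over static observations only; fall back to the plain
    # temporal median where a pixel never appears static.
    masked = np.where(static[..., None], stack, np.nan)
    bg = np.nanmedian(masked, axis=0)
    fallback = np.median(stack, axis=0)
    bg = np.where(np.isnan(bg), fallback, bg)
    return bg.astype(np.uint8)
```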
Hierarchical improvement of foreground segmentation masks in background subtraction
A plethora of algorithms have been defined for foreground segmentation, a fundamental stage for many computer vision applications. In this work, we propose a post-processing framework to improve the foreground segmentation performance of background subtraction algorithms. We define a hierarchical framework for extending segmented foreground pixels to undetected foreground object areas and for removing erroneously segmented foreground. Firstly, we create a motion-aware hierarchical image segmentation of each frame that prevents merging foreground and background image regions. Then, we estimate the quality of the foreground mask through the fitness between the binary regions in the mask and the hierarchy of segmented regions. Finally, the improved foreground mask is obtained as an optimal labeling by jointly exploiting foreground quality and spatial color relations in a pixel-wise fully-connected Conditional Random Field. Experiments are conducted over four large and heterogeneous datasets with varied challenges (CDNET2014, LASIESTA, SABS and BMC), demonstrating the capability of the proposed framework to improve background subtraction results. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R).
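The final labeling step can be sketched with an off-the-shelf fully-connected CRF. The snippet below uses the pydensecrf library; here the unary term comes from any soft foreground map fg_prob, whereas the paper derives it from the hierarchy-based quality estimation, and the kernel parameters (sxy, srgb, compat) are illustrative choices, not the authors' settings.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def refine_mask(image, fg_prob, n_iters=5):
    # image: HxWx3 uint8; fg_prob: HxW foreground probability in [0, 1]
    h, w = fg_prob.shape
    probs = np.stack([1.0 - fg_prob, fg_prob]).astype(np.float32)  # (2, H, W)
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness kernel: spatial proximity only.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: spatial proximity plus color similarity.
    d.addPairwiseBilateral(sxy=60, srgb=10,
                           rgbim=np.ascontiguousarray(image), compat=5)
    q = np.array(d.inference(n_iters)).reshape(2, h, w)
    return (q[1] > q[0]).astype(np.uint8)  # 1 = foreground
```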
LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
Textual noise, such as typos or abbreviations, is a well-known issue that penalizes vanilla Transformers on most downstream tasks. We show that this is also the case for sentence similarity, a fundamental task in multiple domains, e.g., matching, retrieval or paraphrasing. Sentence similarity can be approached using cross-encoders, where the two sentences are concatenated in the input, allowing the model to exploit their inter-relations. Previous works addressing the noise issue mainly rely on data augmentation strategies, showing improved robustness when dealing with corrupted samples that are similar to the ones used for training. However, all these methods still suffer from the token distribution shift induced by typos. In this work, we propose to tackle textual noise by equipping cross-encoders with a novel LExical-aware Attention module (LEA) that incorporates lexical similarities between words in both sentences. By using raw text similarities, our approach avoids the tokenization shift problem and obtains improved robustness. We demonstrate that the attention bias introduced by LEA helps cross-encoders to tackle complex scenarios with textual noise, especially in domains with short-text descriptions and limited context. Experiments using three popular Transformer encoders on five e-commerce datasets for product matching show that LEA consistently boosts performance in the presence of noise, while remaining competitive on the original (clean) splits. We also evaluate our approach on two datasets for textual entailment and paraphrasing, showing that LEA is robust to typos in domains with longer sentences and more natural context. Additionally, we thoroughly analyze several design choices in our approach, providing insights about the impact of the decisions made and fostering future research in cross-encoders dealing with typos. Comment: KDD'23 conference (main research track). (*) These authors contributed equally.
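The general idea of a lexical attention bias can be sketched in a few lines: compute a tokenizer-free similarity between the raw words of the two sentences and add it to the cross-sentence attention logits. The sketch below is illustrative only; char_ngram_jaccard, the trigram Jaccard choice, and the single scale factor are assumptions, not the paper's exact LEA formulation.

```python
import torch

def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    # Character n-gram Jaccard similarity on raw text (tokenizer-free),
    # so a typo only perturbs a few n-grams instead of shifting tokens.
    grams = lambda s: {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / max(len(ga | gb), 1)

def lexical_bias(words_a, words_b, scale=1.0):
    # Pairwise lexical similarity matrix between the two sentences.
    sims = torch.tensor([[char_ngram_jaccard(wa, wb) for wb in words_b]
                         for wa in words_a])
    return scale * sims

def biased_attention(scores, bias):
    # scores: (len_a, len_b) raw attention logits for positions
    # attending across the sentence boundary in a cross-encoder.
    return torch.softmax(scores + bias, dim=-1)
```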
Long-Term Stationary Object Detection Based on Spatio-Temporal Change Detection
D. Ortego, J. C. SanMiguel and J. M. Martínez, "Long-Term Stationary Object Detection Based on Spatio-Temporal Change Detection," IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2368-2372, Dec. 2015. doi: 10.1109/LSP.2015.2482598
We present a block-wise approach to detect stationary objects based on spatio-temporal change detection. First, block candidates are extracted by filtering out consecutive blocks containing moving objects. Then, an online clustering approach groups similar blocks at each spatial location over time via the statistical variation of pixel ratios. Stability changes are identified by analyzing the relationships between the most repeated clusters at regular sampling instants. Finally, stationary objects are detected as those stability changes that exceed an alarm time and have not been visualized before. Unlike previous approaches making use of Background Subtraction, the proposed approach does not require foreground segmentation and provides robustness to illumination changes, crowds and intermittent object motion. The experiments over a heterogeneous dataset demonstrate the ability of the proposed approach for short- and long-term operation while overcoming challenging issues. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R) and by the TEC department (UAM).
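A minimal sketch of the per-block monitoring idea follows, under stated assumptions: the class name BlockStabilityMonitor, the mean-color descriptor, the Euclidean matching distance and the thresholds are all illustrative stand-ins for the letter's pixel-ratio statistics, not its exact procedure.

```python
import numpy as np

class BlockStabilityMonitor:
    # Each spatial block keeps a set of appearance clusters over time;
    # a stationary object is flagged when a non-initial cluster becomes
    # dominant and stays dominant past an alarm time.
    def __init__(self, dist_thresh=20.0, alarm_frames=300):
        self.dist_thresh = dist_thresh
        self.alarm_frames = alarm_frames
        self.clusters = []      # list of [mean_descriptor, hit_count]
        self.current = None     # index of the cluster matched last
        self.stable_for = 0

    def update(self, block):
        desc = block.reshape(-1, 3).mean(axis=0)   # mean color descriptor
        # Match to the closest existing cluster, or spawn a new one.
        best, best_d = None, np.inf
        for i, (mean, _) in enumerate(self.clusters):
            d = np.linalg.norm(desc - mean)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.dist_thresh:
            self.clusters.append([desc, 1])
            best = len(self.clusters) - 1
        else:
            mean, n = self.clusters[best]
            self.clusters[best] = [(mean * n + desc) / (n + 1), n + 1]
        # Track how long the dominant cluster has been unchanged.
        self.stable_for = self.stable_for + 1 if best == self.current else 0
        self.current = best
        # Alarm: a non-initial (non-background) cluster persisted long enough.
        return best != 0 and self.stable_for >= self.alarm_frames
```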
Rejection based multipath reconstruction for background estimation in video sequences with stationary objects
A definitive version was published in Computer Vision and Image Understanding, Vol. 147 (2016). doi: 10.1016/j.cviu.2016.03.012
Background estimation in video consists in extracting a foreground-free image from a set of training frames. Moving and stationary objects may affect the background visibility, thus invalidating the assumption of much of the related literature that the background is the temporally dominant data. In this paper, we present a temporal-spatial block-level approach for background estimation in video that copes with moving and stationary objects. First, a Temporal Analysis module obtains a compact representation of the training data by motion filtering and dimensionality reduction. Then, a threshold-free hierarchical clustering determines a set of candidates to represent the background for each spatial location (block). Second, a Spatial Analysis module iteratively reconstructs the background using these candidates. For each spatial location, multiple reconstruction hypotheses (paths) are explored to obtain its neighboring locations by enforcing inter-block similarity and intra-block homogeneity constraints in terms of color discontinuity, color dissimilarity and variability. The experimental results show that the proposed approach outperforms the related state of the art over challenging video sequences in the presence of moving and stationary objects. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R) and by the TEC department (Universidad Autónoma de Madrid).
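One of the named cues, color discontinuity across block borders, is easy to illustrate. The sketch below scores a single background candidate against one already-reconstructed neighbor and picks the smoothest seam; this is a greedy single-path simplification (the full method explores multiple paths and rejects inconsistent ones), and seam_discontinuity and pick_candidate are hypothetical names.

```python
import numpy as np

def seam_discontinuity(candidate, neighbor, side="left"):
    # Mean absolute color difference across the shared border of two
    # HxWx3 blocks; 'side' names the edge of `candidate` that touches
    # `neighbor`. A true background block should continue smoothly
    # into its already-reconstructed neighbors.
    edges = {
        "left":   (candidate[:, 0],  neighbor[:, -1]),
        "right":  (candidate[:, -1], neighbor[:, 0]),
        "top":    (candidate[0],     neighbor[-1]),
        "bottom": (candidate[-1],    neighbor[0]),
    }
    a, b = edges[side]
    return float(np.abs(a.astype(np.float32) - b.astype(np.float32)).mean())

def pick_candidate(candidates, neighbor, side="left"):
    # Keep the candidate whose seam with the reconstructed neighbor
    # is smoothest.
    costs = [seam_discontinuity(c, neighbor, side) for c in candidates]
    return candidates[int(np.argmin(costs))]
```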
Unsupervised Contrastive Learning of Sound Event Representations
Self-supervised representation learning can mitigate the limitations of recognition tasks with few manually labeled but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily by mixing training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation and in two in-domain downstream sound event classification tasks, namely using limited manually labeled data and using noisy labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels, outperforming supervised baselines. Comment: A 4-page version was submitted to ICASSP 2021.
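The two core ingredients, mixing an event with an unrelated background and a contrastive objective over the two views, can be sketched as follows. This is a minimal illustration assuming 1-D waveforms and a SimCLR-style NT-Xent loss; the function names, the SNR-based mixing and the temperature value are assumptions, not the paper's exact pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mix_with_background(event, background, snr_db):
    # Scale the background so the event-to-background energy ratio
    # matches snr_db, then mix (both are 1-D float waveforms).
    e_event = np.mean(event ** 2)
    e_bg = np.mean(background ** 2) + 1e-12
    gain = np.sqrt(e_event / (e_bg * 10 ** (snr_db / 10)))
    return event + gain * background

def nt_xent(z1, z2, temperature=0.1):
    # Contrastive loss between two batches of embeddings of the same
    # events under different augmented views (z1[i] pairs with z2[i]).
    z = F.normalize(torch.cat([z1, z2]), dim=1)         # (2N, D)
    sim = z @ z.t() / temperature
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))                   # drop self-pairs
    targets = torch.arange(2 * n, device=z.device)
    targets = (targets + n) % (2 * n)                   # index of positive
    return F.cross_entropy(sim, targets)
```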